Overview

Dataset statistics

Number of variables: 24
Number of observations: 51290
Missing cells: 41296
Missing cells (%): 3.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.4 MiB
Average record size in memory: 192.0 B
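These overview figures are linked by simple arithmetic. The sketch below uses only numbers copied from this report. The missing-cell percentage is missing cells divided by all cells (variables × observations). The memory figures fit an assumption, labelled in the code, that every cell counts as 8 bytes. That is shallow accounting, which is how pandas reports object columns unless deep inspection is requested. Under this assumption each column takes 400.7 KiB, the memory size shown for every variable below.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 24
n_observations = 51290
missing_cells = 41296

# Missing-cell percentage: missing cells over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
print(f"Missing cells (%): {missing_pct:.1f}%")

# Assumption: each cell is counted as 8 bytes (a 64-bit number or an
# object pointer), i.e. shallow memory accounting without deep inspection.
BYTES_PER_CELL = 8
column_bytes = BYTES_PER_CELL * n_observations
total_bytes = column_bytes * n_variables

print(f"Memory size per column: {column_bytes / 1024:.1f} KiB")
print(f"Total size in memory: {total_bytes / 1024**2:.1f} MiB")
print(f"Average record size: {total_bytes / n_observations:.1f} B")
```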

Variable types

Categorical: 15
Numeric: 7
Date: 2

Warnings

Order ID has a high cardinality: 25728 distinct values
Customer ID has a high cardinality: 17415 distinct values
Customer Name has a high cardinality: 796 distinct values
City has a high cardinality: 3650 distinct values
State has a high cardinality: 1102 distinct values
Country has a high cardinality: 165 distinct values
Product ID has a high cardinality: 3788 distinct values
Product Name has a high cardinality: 3788 distinct values
Market is highly correlated with Region
Region is highly correlated with Market
Sub-Category is highly correlated with Category
Category is highly correlated with Sub-Category
Postal Code has 41296 (80.5%) missing values
Row ID has unique values
Discount has 29009 (56.6%) zeros
Profit has 668 (1.3%) zeros
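A categorical variable gets a high-cardinality warning when its number of distinct values exceeds a threshold. Assume pandas-profiling's default `cardinality_threshold` of 50; the value actually used is recorded in config.yaml. Under that assumption, the distinct counts listed in this report reproduce the warnings above. Country (165) is flagged, while Region (23), Market (5), Ship Mode (4) and Segment (3) are not. A minimal sketch of that rule:

```python
# Distinct counts copied from this report (Warnings and Variables sections).
distinct_counts = {
    "Order ID": 25728,
    "Ship Mode": 4,
    "Customer ID": 17415,
    "Customer Name": 796,
    "Segment": 3,
    "City": 3650,
    "State": 1102,
    "Country": 165,
    "Region": 23,
    "Market": 5,
    "Product ID": 3788,
    "Product Name": 3788,
}

# Assumed default threshold; the value actually used lives in config.yaml.
CARDINALITY_THRESHOLD = 50

def high_cardinality(counts, threshold=CARDINALITY_THRESHOLD):
    """Return, sorted by name, the variables whose distinct count exceeds threshold."""
    return sorted(name for name, n in counts.items() if n > threshold)

print(high_cardinality(distinct_counts))
```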

Reproduction

Analysis started: 2021-02-20 07:00:16.909718
Analysis finished: 2021-02-20 07:00:31.200671
Duration: 14.29 seconds
Software version: pandas-profiling v2.10.0
Configuration: config.yaml
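The reported duration is the difference between the two timestamps, rounded to two decimals. A quick check with Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2021-02-20 07:00:16.909718")
finished = datetime.fromisoformat("2021-02-20 07:00:31.200671")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```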

Variables

Row ID
Real number (ℝ≥0)

UNIQUE

Distinct: 51290
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25645.5
Minimum: 1
Maximum: 51290
Zeros: 0
Zeros (%): 0.0%
Memory size: 400.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2565.45
Q1: 12823.25
Median: 25645.5
Q3: 38467.75
95-th percentile: 48725.55
Maximum: 51290
Range: 51289
Interquartile range (IQR): 25644.5
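Row ID takes every integer from 1 to 51290 exactly once, so its quantiles can be recomputed without the dataset. The values above match linear interpolation between order statistics, the default in NumPy and pandas. In Python's standard library the same rule is `statistics.quantiles(..., method="inclusive")`. The sketch below checks the figures under that assumption; it is not pandas-profiling's own code.

```python
import statistics

# Row ID is a permutation of 1..51290, so its sorted values are this range.
row_ids = range(1, 51291)

# "inclusive" interpolates linearly between order statistics, treating the
# minimum and maximum as the 0th and 100th percentiles.
q1, median, q3 = statistics.quantiles(row_ids, n=4, method="inclusive")
ventiles = statistics.quantiles(row_ids, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print("5-th percentile:", p5)
print("Q1:", q1, "median:", median, "Q3:", q3)
print("95-th percentile:", p95)
print("Range:", max(row_ids) - min(row_ids))
print("IQR:", q3 - q1)
```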

Descriptive statistics

Standard deviation: 14806.29199
Coefficient of variation (CV): 0.577344641
Kurtosis: -1.2
Mean: 25645.5
Median Absolute Deviation (MAD): 12822.5
Skewness: 1.802083994 × 10^-17
Sum: 1315357695
Variance: 219226282.5
Monotonicity: Not monotonic
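The descriptive statistics for Row ID can be checked the same way. The variance of 219226282.5 equals n(n+1)/12 for n = 51290. That is the sample variance (denominator n - 1) of the integers 1..n, the convention pandas uses by default (`ddof=1`). In the sketch below, MAD is taken as the median of absolute deviations from the median. For this symmetric sequence the mean absolute deviation also comes out to 12822.5, so the data alone cannot tell which definition the report used.

```python
import math
import statistics

row_ids = range(1, 51291)
n = len(row_ids)

mean = statistics.mean(row_ids)
total = sum(row_ids)
variance = statistics.variance(row_ids)   # sample variance, denominator n - 1
std = math.sqrt(variance)
cv = std / mean                           # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
med = statistics.median(row_ids)
mad = statistics.median(abs(x - med) for x in row_ids)

print("Mean:", mean, "Sum:", total)
print("Variance:", variance, "n(n+1)/12:", n * (n + 1) / 12)
print(f"Standard deviation: {std:.5f}, CV: {cv:.9f}")
print("MAD:", mad)
```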
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
35460 | 1 | < 0.1%
661 | 1 | < 0.1%
2708 | 1 | < 0.1%
12947 | 1 | < 0.1%
14994 | 1 | < 0.1%
8849 | 1 | < 0.1%
10896 | 1 | < 0.1%
49805 | 1 | < 0.1%
37511 | 1 | < 0.1%
Other values (51280) | 51280 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
51290 | 1 | < 0.1%
51289 | 1 | < 0.1%
51288 | 1 | < 0.1%
51287 | 1 | < 0.1%
51286 | 1 | < 0.1%

Order ID
Categorical

HIGH CARDINALITY

Distinct: 25728
Distinct (%): 50.2%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
CA-2015-SV20365140-42268: 14
TO-2015-AB600131-42299: 13
IN-2013-TB21055113-41562: 13
MX-2015-PO1885082-42279: 13
IN-2014-MH1778527-41697: 13
Other values (25723): 51224

Length

Max length: 24
Median length: 23
Mean length: 23.11403782
Min length: 20

Characters and Unicode

Total characters: 1185519
Distinct characters: 40
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
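The "categories" in these tables are Unicode general categories. Python's standard `unicodedata.category` returns the two-letter code for each character, for example `Lu` (Uppercase Letter), `Ll` (Lowercase Letter), `Nd` (Decimal Number), `Pd` (Dash Punctuation) and `Zs` (Space Separator). The sketch below tallies them for one Order ID taken from this report. The code-to-name mapping is written out by hand for illustration and covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pf": "Final Punctuation",
    "Zs": "Space Separator",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}

print(category_counts("CA-2015-SV20365140-42268"))
```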

Unique

Unique: 12935
Unique (%): 25.2%
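"Distinct" and "Unique" count different things. Distinct is the number of different values. Unique is the number of values that occur in exactly one row. For Order ID, 12935 of the 25728 distinct values appear only once; the others repeat, presumably because one order can span several rows. Unique (%) is 12935 / 51290 = 25.2% of all rows, not of the distinct values. A small illustration with `collections.Counter` on hypothetical IDs:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): the number of different values, and the
    number of values that occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical order IDs: "A" and "B" repeat, "C" occurs once.
orders = ["A", "A", "A", "B", "B", "C"]
distinct, unique = distinct_and_unique(orders)
print(distinct, unique, f"{100 * unique / len(orders):.1f}%")

# The Order ID figure from this report, as a share of all rows.
print(f"Order ID unique (%): {100 * 12935 / 51290:.1f}%")
```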

Sample

1st row: CA-2014-AB10015140-41954
2nd row: IN-2014-JR162107-41675
3rd row: IN-2014-CR127307-41929
4th row: ES-2014-KM1637548-41667
5th row: SG-2014-RH9495111-41948
Common values
Value | Count | Frequency (%)
CA-2015-SV20365140-42268 | 14 | < 0.1%
TO-2015-AB600131-42299 | 13 | < 0.1%
IN-2013-TB21055113-41562 | 13 | < 0.1%
MX-2015-PO1885082-42279 | 13 | < 0.1%
IN-2014-MH1778527-41697 | 13 | < 0.1%
NI-2015-TC1098095-42033 | 13 | < 0.1%
IN-2015-BE1133527-42276 | 12 | < 0.1%
CA-2015-AC10615140-42250 | 12 | < 0.1%
IN-2012-SP2062011-41262 | 12 | < 0.1%
MX-2014-RA1988551-41795 | 12 | < 0.1%
Other values (25718) | 51163 | 99.8%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
ca-2015-sv20365140-42268 | 14 | < 0.1%
ni-2015-tc1098095-42033 | 13 | < 0.1%
in-2013-tb21055113-41562 | 13 | < 0.1%
in-2014-mh1778527-41697 | 13 | < 0.1%
mx-2015-po1885082-42279 | 13 | < 0.1%
to-2015-ab600131-42299 | 13 | < 0.1%
mx-2014-jk1537082-41907 | 12 | < 0.1%
in-2015-be1133527-42276 | 12 | < 0.1%
in-2012-sp2062011-41262 | 12 | < 0.1%
ca-2015-ac10615140-42250 | 12 | < 0.1%
Other values (25718) | 51163 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
1 | 180858 | 15.3%
- | 153870 | 13.0%
2 | 127701 | 10.8%
0 | 125028 | 10.5%
4 | 112321 | 9.5%
5 | 81447 | 6.9%
3 | 52327 | 4.4%
8 | 40586 | 3.4%
9 | 38522 | 3.2%
7 | 34486 | 2.9%
Other values (30) | 238373 | 20.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 826489 | 69.7%
Uppercase Letter | 204953 | 17.3%
Dash Punctuation | 153870 | 13.0%
Lowercase Letter | 207 | < 0.1%

Most frequent character per category

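Uppercase Letter
Value | Count | Frequency (%)
S | 22610 | 11.0%
C | 18031 | 8.8%
M | 17510 | 8.5%
A | 15726 | 7.7%
I | 15561 | 7.6%
E | 11955 | 5.8%
N | 11166 | 5.4%
D | 9944 | 4.9%
B | 8607 | 4.2%
T | 8484 | 4.1%
Other values (16) | 65359 | 31.9%

Decimal Number
Value | Count | Frequency (%)
1 | 180858 | 21.9%
2 | 127701 | 15.5%
0 | 125028 | 15.1%
4 | 112321 | 13.6%
5 | 81447 | 9.9%
3 | 52327 | 6.3%
8 | 40586 | 4.9%
9 | 38522 | 4.7%
7 | 34486 | 4.2%
6 | 33213 | 4.0%

Lowercase Letter
Value | Count | Frequency (%)
p | 81 | 39.1%
o | 68 | 32.9%
l | 58 | 28.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 153870 | 100.0%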

Most occurring scripts

Value | Count | Frequency (%)
Common | 980359 | 82.7%
Latin | 205160 | 17.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 22610 | 11.0%
C | 18031 | 8.8%
M | 17510 | 8.5%
A | 15726 | 7.7%
I | 15561 | 7.6%
E | 11955 | 5.8%
N | 11166 | 5.4%
D | 9944 | 4.8%
B | 8607 | 4.2%
T | 8484 | 4.1%
Other values (19) | 65566 | 32.0%

Common
Value | Count | Frequency (%)
1 | 180858 | 18.4%
- | 153870 | 15.7%
2 | 127701 | 13.0%
0 | 125028 | 12.8%
4 | 112321 | 11.5%
5 | 81447 | 8.3%
3 | 52327 | 5.3%
8 | 40586 | 4.1%
9 | 38522 | 3.9%
7 | 34486 | 3.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1185519 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 180858 | 15.3%
- | 153870 | 13.0%
2 | 127701 | 10.8%
0 | 125028 | 10.5%
4 | 112321 | 9.5%
5 | 81447 | 6.9%
3 | 52327 | 4.4%
8 | 40586 | 3.4%
9 | 38522 | 3.2%
7 | 34486 | 2.9%
Other values (30) | 238373 | 20.1%

Distinct: 1430
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Minimum: 2012-01-01 00:00:00
Maximum: 2015-12-31 00:00:00
Histogram with fixed size bins (bins=50)

Distinct: 1464
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Minimum: 2012-01-03 00:00:00
Maximum: 2016-01-07 00:00:00
Histogram with fixed size bins (bins=50)

Ship Mode
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Standard Class: 30775
Second Class: 10309
First Class: 7505
Same Day: 2701

Length

Max length: 14
Median length: 14
Mean length: 12.84306882
Min length: 8
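Because Ship Mode has only four values, its length statistics can be recomputed from the value counts alone, weighting each string's length by the number of rows that carry it. The weighted total matches the Total characters figure of 658721 reported for Ship Mode. Dividing by the 51290 rows gives the mean length above. A sketch using figures copied from this report:

```python
import statistics

# Value counts copied from the Ship Mode table in this report.
ship_mode_counts = {
    "Standard Class": 30775,
    "Second Class": 10309,
    "First Class": 7505,
    "Same Day": 2701,
}

n_rows = sum(ship_mode_counts.values())
total_chars = sum(len(value) * count for value, count in ship_mode_counts.items())
mean_length = total_chars / n_rows

# Median length over all rows: expand each length by its row count.
lengths = [len(value) for value, count in ship_mode_counts.items() for _ in range(count)]
median_length = statistics.median(lengths)

print(n_rows, total_chars, f"{mean_length:.8f}", median_length)
```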

Characters and Unicode

Total characters: 658721
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: First Class
2nd row: Second Class
3rd row: First Class
4th row: First Class
5th row: Same Day
Common values
Value | Count | Frequency (%)
Standard Class | 30775 | 60.0%
Second Class | 10309 | 20.1%
First Class | 7505 | 14.6%
Same Day | 2701 | 5.3%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
class | 48589 | 47.4%
standard | 30775 | 30.0%
second | 10309 | 10.0%
first | 7505 | 7.3%
day | 2701 | 2.6%
same | 2701 | 2.6%
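The lowercase word table can be reproduced by splitting each value on whitespace, lowercasing the words, and weighting each word by its value's count. "class" occurs 48589 times because it appears in three of the four values (30775 + 10309 + 7505). A sketch assuming plain whitespace splitting; pandas-profiling's own tokenizer may treat punctuation differently.

```python
from collections import Counter

# Value counts copied from the Ship Mode table in this report.
ship_mode_counts = {
    "Standard Class": 30775,
    "Second Class": 10309,
    "First Class": 7505,
    "Same Day": 2701,
}

# Each word inherits the row count of the value it appears in.
words = Counter()
for value, count in ship_mode_counts.items():
    for word in value.lower().split():
        words[word] += count

total_words = sum(words.values())
for word, count in words.most_common():
    print(f"{word}: {count} ({100 * count / total_words:.1f}%)")
```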

Most occurring characters

Value | Count | Frequency (%)
a | 115541 | 17.5%
s | 104683 | 15.9%
d | 71859 | 10.9%
(space) | 51290 | 7.8%
C | 48589 | 7.4%
l | 48589 | 7.4%
S | 43785 | 6.6%
n | 41084 | 6.2%
r | 38280 | 5.8%
t | 38280 | 5.8%
Other values (8) | 56741 | 8.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 504851 | 76.6%
Uppercase Letter | 102580 | 15.6%
Space Separator | 51290 | 7.8%

Most frequent character per category

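Lowercase Letter
Value | Count | Frequency (%)
a | 115541 | 22.9%
s | 104683 | 20.7%
d | 71859 | 14.2%
l | 48589 | 9.6%
n | 41084 | 8.1%
r | 38280 | 7.6%
t | 38280 | 7.6%
e | 13010 | 2.6%
c | 10309 | 2.0%
o | 10309 | 2.0%
Other values (3) | 12907 | 2.6%

Uppercase Letter
Value | Count | Frequency (%)
C | 48589 | 47.4%
S | 43785 | 42.7%
F | 7505 | 7.3%
D | 2701 | 2.6%

Space Separator
Value | Count | Frequency (%)
(space) | 51290 | 100.0%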

Most occurring scripts

Value | Count | Frequency (%)
Latin | 607431 | 92.2%
Common | 51290 | 7.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 115541 | 19.0%
s | 104683 | 17.2%
d | 71859 | 11.8%
C | 48589 | 8.0%
l | 48589 | 8.0%
S | 43785 | 7.2%
n | 41084 | 6.8%
r | 38280 | 6.3%
t | 38280 | 6.3%
e | 13010 | 2.1%
Other values (7) | 43731 | 7.2%

Common
Value | Count | Frequency (%)
(space) | 51290 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 658721 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 115541 | 17.5%
s | 104683 | 15.9%
d | 71859 | 10.9%
(space) | 51290 | 7.8%
C | 48589 | 7.4%
l | 48589 | 7.4%
S | 43785 | 6.6%
n | 41084 | 6.2%
r | 38280 | 5.8%
t | 38280 | 5.8%
Other values (8) | 56741 | 8.6%

Customer ID
Categorical

HIGH CARDINALITY

Distinct: 17415
Distinct (%): 34.0%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
SV-203651406: 26
WB-218501404: 24
AP-109151404: 23
EM-1396082: 21
CS-121757: 20
Other values (17410): 51176

Length

Max length: 12
Median length: 10
Mean length: 10.30889062
Min length: 7

Characters and Unicode

Total characters: 528743
Distinct characters: 40
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6065
Unique (%): 11.8%

Sample

1st row: AB-100151402
2nd row: JR-162107
3rd row: CR-127307
4th row: KM-1637548
5th row: RH-9495111
Common values
Value | Count | Frequency (%)
SV-203651406 | 26 | 0.1%
WB-218501404 | 24 | < 0.1%
AP-109151404 | 23 | < 0.1%
EM-1396082 | 21 | < 0.1%
CS-121757 | 20 | < 0.1%
JK-1609027 | 19 | < 0.1%
RW-195401404 | 19 | < 0.1%
YC-2189545 | 18 | < 0.1%
TB-210551406 | 18 | < 0.1%
RB-1933082 | 18 | < 0.1%
Other values (17405) | 51084 | 99.6%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
sv-203651406 | 26 | 0.1%
wb-218501404 | 24 | < 0.1%
ap-109151404 | 23 | < 0.1%
em-1396082 | 21 | < 0.1%
cs-121757 | 20 | < 0.1%
jk-1609027 | 19 | < 0.1%
rw-195401404 | 19 | < 0.1%
tb-210551406 | 18 | < 0.1%
rb-1933082 | 18 | < 0.1%
yc-2189545 | 18 | < 0.1%
Other values (17405) | 51084 | 99.6%

Most occurring characters

Value | Count | Frequency (%)
1 | 80165 | 15.2%
0 | 57252 | 10.8%
- | 51290 | 9.7%
5 | 49388 | 9.3%
4 | 37214 | 7.0%
2 | 32895 | 6.2%
8 | 27651 | 5.2%
3 | 24211 | 4.6%
6 | 22351 | 4.2%
9 | 22193 | 4.2%
Other values (30) | 124133 | 23.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 374873 | 70.9%
Uppercase Letter | 102373 | 19.4%
Dash Punctuation | 51290 | 9.7%
Lowercase Letter | 207 | < 0.1%

Most frequent character per category

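Uppercase Letter
Value | Count | Frequency (%)
M | 9011 | 8.8%
C | 8835 | 8.6%
S | 8738 | 8.5%
B | 8382 | 8.2%
D | 6582 | 6.4%
J | 6171 | 6.0%
A | 5967 | 5.8%
H | 5218 | 5.1%
P | 5206 | 5.1%
R | 4849 | 4.7%
Other values (16) | 33414 | 32.6%

Decimal Number
Value | Count | Frequency (%)
1 | 80165 | 21.4%
0 | 57252 | 15.3%
5 | 49388 | 13.2%
4 | 37214 | 9.9%
2 | 32895 | 8.8%
8 | 27651 | 7.4%
3 | 24211 | 6.5%
6 | 22351 | 6.0%
9 | 22193 | 5.9%
7 | 21553 | 5.7%

Lowercase Letter
Value | Count | Frequency (%)
p | 81 | 39.1%
o | 68 | 32.9%
l | 58 | 28.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 51290 | 100.0%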

Most occurring scripts

Value | Count | Frequency (%)
Common | 426163 | 80.6%
Latin | 102580 | 19.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 9011 | 8.8%
C | 8835 | 8.6%
S | 8738 | 8.5%
B | 8382 | 8.2%
D | 6582 | 6.4%
J | 6171 | 6.0%
A | 5967 | 5.8%
H | 5218 | 5.1%
P | 5206 | 5.1%
R | 4849 | 4.7%
Other values (19) | 33621 | 32.8%

Common
Value | Count | Frequency (%)
1 | 80165 | 18.8%
0 | 57252 | 13.4%
- | 51290 | 12.0%
5 | 49388 | 11.6%
4 | 37214 | 8.7%
2 | 32895 | 7.7%
8 | 27651 | 6.5%
3 | 24211 | 5.7%
6 | 22351 | 5.2%
9 | 22193 | 5.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 528743 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 80165 | 15.2%
0 | 57252 | 10.8%
- | 51290 | 9.7%
5 | 49388 | 9.3%
4 | 37214 | 7.0%
2 | 32895 | 6.2%
8 | 27651 | 5.2%
3 | 24211 | 4.6%
6 | 22351 | 4.2%
9 | 22193 | 4.2%
Other values (30) | 124133 | 23.5%

Customer Name
Categorical

HIGH CARDINALITY

Distinct: 796
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Muhammed Yedwab: 108
Steven Ward: 106
Patrick O'Brill: 102
Bill Eplett: 102
Gary Hwang: 102
Other values (791): 50770

Length

Max length: 22
Median length: 13
Mean length: 12.94499903
Min length: 7

Characters and Unicode

Total characters: 663949
Distinct characters: 57
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Aaron Bergman
2nd row: Justin Ritter
3rd row: Craig Reiter
4th row: Katherine Murray
5th row: Rick Hansen
Common values
Value | Count | Frequency (%)
Muhammed Yedwab | 108 | 0.2%
Steven Ward | 106 | 0.2%
Patrick O'Brill | 102 | 0.2%
Bill Eplett | 102 | 0.2%
Gary Hwang | 102 | 0.2%
Harry Greene | 101 | 0.2%
Eric Murdock | 100 | 0.2%
Art Ferguson | 98 | 0.2%
Brosina Hoffman | 97 | 0.2%
Chloris Kastensmidt | 96 | 0.2%
Other values (786) | 50278 | 98.0%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
michael | 655 | 0.6%
john | 522 | 0.5%
paul | 438 | 0.4%
patrick | 437 | 0.4%
tom | 430 | 0.4%
stewart | 426 | 0.4%
anthony | 424 | 0.4%
frank | 422 | 0.4%
alan | 402 | 0.4%
bill | 402 | 0.4%
Other values (903) | 98317 | 95.6%

Most occurring characters

Value | Count | Frequency (%)
a | 61448 | 9.3%
e | 60921 | 9.2%
n | 51794 | 7.8%
(space) | 51585 | 7.8%
r | 48359 | 7.3%
i | 40349 | 6.1%
l | 34222 | 5.2%
o | 30786 | 4.6%
t | 27197 | 4.1%
s | 23187 | 3.5%
Other values (47) | 234101 | 35.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 506201 | 76.2%
Uppercase Letter | 105210 | 15.8%
Space Separator | 51585 | 7.8%
Other Punctuation | 728 | 0.1%
Dash Punctuation | 225 | < 0.1%

Most frequent character per category

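Lowercase Letter
Value | Count | Frequency (%)
a | 61448 | 12.1%
e | 60921 | 12.0%
n | 51794 | 10.2%
r | 48359 | 9.6%
i | 40349 | 8.0%
l | 34222 | 6.8%
o | 30786 | 6.1%
t | 27197 | 5.4%
s | 23187 | 4.6%
h | 19661 | 3.9%
Other values (18) | 108277 | 21.4%

Uppercase Letter
Value | Count | Frequency (%)
C | 9419 | 9.0%
M | 9185 | 8.7%
S | 8731 | 8.3%
B | 8677 | 8.2%
D | 6780 | 6.4%
A | 6298 | 6.0%
J | 6171 | 5.9%
H | 5434 | 5.2%
P | 5206 | 4.9%
R | 4977 | 4.7%
Other values (16) | 34332 | 32.6%

Space Separator
Value | Count | Frequency (%)
(space) | 51585 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
' | 728 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 225 | 100.0%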

Most occurring scripts

Value | Count | Frequency (%)
Latin | 611411 | 92.1%
Common | 52538 | 7.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 61448 | 10.1%
e | 60921 | 10.0%
n | 51794 | 8.5%
r | 48359 | 7.9%
i | 40349 | 6.6%
l | 34222 | 5.6%
o | 30786 | 5.0%
t | 27197 | 4.4%
s | 23187 | 3.8%
h | 19661 | 3.2%
Other values (44) | 213487 | 34.9%

Common
Value | Count | Frequency (%)
(space) | 51585 | 98.2%
' | 728 | 1.4%
- | 225 | 0.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 663533 | 99.9%
None | 416 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 61448 | 9.3%
e | 60921 | 9.2%
n | 51794 | 7.8%
(space) | 51585 | 7.8%
r | 48359 | 7.3%
i | 40349 | 6.1%
l | 34222 | 5.2%
o | 30786 | 4.6%
t | 27197 | 4.1%
s | 23187 | 3.5%
Other values (44) | 233685 | 35.2%

None
Value | Count | Frequency (%)
ö | 293 | 70.4%
ä | 76 | 18.3%
ü | 47 | 11.3%
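The "None" block above holds the 416 accented characters (ö, ä, ü) that fall outside ASCII; the report does not name their block. All three belong to the Unicode Latin-1 Supplement block (U+0080 to U+00FF). The sketch below classifies characters by code-point range. Its range table is written by hand and covers only blocks relevant to this report, so it is an illustration, not a general block lookup. The sample name is hypothetical.

```python
from collections import Counter

# Hand-written subset of Unicode block ranges (inclusive code points).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin (ASCII)"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x2000, 0x206F, "General Punctuation"),
]

def block_of(ch):
    """Return the name of the listed block containing `ch`, or None."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None

def block_counts(text):
    """Count the characters of `text` by (listed) Unicode block."""
    return Counter(block_of(ch) for ch in text)

# Hypothetical sample name containing two non-ASCII letters.
print(block_counts("Jörg Müller"))
```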

Segment
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Consumer: 26518
Corporate: 15429
Home Office: 9343

Length

Max length: 11
Median length: 8
Mean length: 8.847299669
Min length: 8

Characters and Unicode

Total characters: 453778
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Consumer
2nd row: Corporate
3rd row: Consumer
4th row: Home Office
5th row: Consumer
Common values
Value | Count | Frequency (%)
Consumer | 26518 | 51.7%
Corporate | 15429 | 30.1%
Home Office | 9343 | 18.2%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
consumer | 26518 | 43.7%
corporate | 15429 | 25.4%
office | 9343 | 15.4%
home | 9343 | 15.4%

Most occurring characters

Value | Count | Frequency (%)
o | 66719 | 14.7%
e | 60633 | 13.4%
r | 57376 | 12.6%
C | 41947 | 9.2%
m | 35861 | 7.9%
n | 26518 | 5.8%
s | 26518 | 5.8%
u | 26518 | 5.8%
f | 18686 | 4.1%
p | 15429 | 3.4%
Other values (7) | 77573 | 17.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 383802 | 84.6%
Uppercase Letter | 60633 | 13.4%
Space Separator | 9343 | 2.1%

Most frequent character per category

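Lowercase Letter
Value | Count | Frequency (%)
o | 66719 | 17.4%
e | 60633 | 15.8%
r | 57376 | 14.9%
m | 35861 | 9.3%
n | 26518 | 6.9%
s | 26518 | 6.9%
u | 26518 | 6.9%
f | 18686 | 4.9%
p | 15429 | 4.0%
a | 15429 | 4.0%
Other values (3) | 34115 | 8.9%

Uppercase Letter
Value | Count | Frequency (%)
C | 41947 | 69.2%
H | 9343 | 15.4%
O | 9343 | 15.4%

Space Separator
Value | Count | Frequency (%)
(space) | 9343 | 100.0%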

Most occurring scripts

Value | Count | Frequency (%)
Latin | 444435 | 97.9%
Common | 9343 | 2.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 66719 | 15.0%
e | 60633 | 13.6%
r | 57376 | 12.9%
C | 41947 | 9.4%
m | 35861 | 8.1%
n | 26518 | 6.0%
s | 26518 | 6.0%
u | 26518 | 6.0%
f | 18686 | 4.2%
p | 15429 | 3.5%
Other values (6) | 68230 | 15.4%

Common
Value | Count | Frequency (%)
(space) | 9343 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 453778 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
o | 66719 | 14.7%
e | 60633 | 13.4%
r | 57376 | 12.6%
C | 41947 | 9.2%
m | 35861 | 7.9%
n | 26518 | 5.8%
s | 26518 | 5.8%
u | 26518 | 5.8%
f | 18686 | 4.1%
p | 15429 | 3.4%
Other values (7) | 77573 | 17.1%

Postal Code
Real number (ℝ≥0)

MISSING

Distinct: 631
Distinct (%): 6.3%
Missing: 41296
Missing (%): 80.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 55190.37943
Minimum: 1040
Maximum: 99301
Zeros: 0
Zeros (%): 0.0%
Memory size: 400.7 KiB
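Two details of the Postal Code summary are worth spelling out. First, Distinct (%) is computed over the non-missing values, 631 / (51290 - 41296) = 6.3%, whereas Missing (%) is computed over all rows. The 9994 non-missing rows equal the United States count in the Country table, which suggests postal codes were recorded only for US orders. Second, the column is stored as a real number, most likely because of the missing values. As a result, ZIP codes with a leading zero lose it: the minimum, 1040, is presumably 01040. The sketch below shows both points. Zero-padding to five digits is an assumed fix, and `format_zip` is a hypothetical helper.

```python
import math

# Figures copied from the Postal Code summary in this report.
n_rows = 51290
missing = 41296
distinct = 631

present = n_rows - missing
print(f"Missing (%): {100 * missing / n_rows:.1f}%")
print(f"Distinct (%): {100 * distinct / present:.1f}% of {present} present values")

def format_zip(code):
    """Hypothetical helper: render a float-typed US ZIP code as 5 digits.

    Assumes every present value is a US ZIP code whose leading zeros were
    dropped by numeric storage; missing values (NaN or None) map to None.
    """
    if code is None or math.isnan(code):
        return None
    return str(int(code)).zfill(5)

print([format_zip(c) for c in (1040.0, 10035.0, float("nan"))])
```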

Quantile statistics

Minimum: 1040
5-th percentile: 10009
Q1: 23223
Median: 56430.5
Q3: 90008
95-th percentile: 98006
Maximum: 99301
Range: 98261
Interquartile range (IQR): 66785

Descriptive statistics

Standard deviation: 32063.69335
Coefficient of variation (CV): 0.5809652639
Kurtosis: -1.493020228
Mean: 55190.37943
Median Absolute Deviation (MAD): 33573.5
Skewness: -0.1285255164
Sum: 551572652
Variance: 1028080431
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
10035 | 263 | 0.5%
10024 | 230 | 0.4%
10009 | 229 | 0.4%
94122 | 203 | 0.4%
10011 | 193 | 0.4%
94110 | 166 | 0.3%
98105 | 165 | 0.3%
19134 | 160 | 0.3%
98103 | 151 | 0.3%
90049 | 151 | 0.3%
Other values (621) | 8083 | 15.8%
(Missing) | 41296 | 80.5%

Minimum 5 values
Value | Count | Frequency (%)
1040 | 1 | < 0.1%
1453 | 6 | < 0.1%
1752 | 2 | < 0.1%
1810 | 4 | < 0.1%
1841 | 33 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
99301 | 6 | < 0.1%
99207 | 7 | < 0.1%
98661 | 5 | < 0.1%
98632 | 3 | < 0.1%
98502 | 5 | < 0.1%

City
Categorical

HIGH CARDINALITY

Distinct: 3650
Distinct (%): 7.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
New York City: 915
Los Angeles: 747
Philadelphia: 537
San Francisco: 510
Santo Domingo: 443
Other values (3645): 48138

Length

Max length: 35
Median length: 8
Mean length: 8.418834081
Min length: 2

Characters and Unicode

Total characters: 431802
Distinct characters: 77
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 491
Unique (%): 1.0%

Sample

1st row: Oklahoma City
2nd row: Wollongong
3rd row: Brisbane
4th row: Berlin
5th row: Dakar
Common values
Value | Count | Frequency (%)
New York City | 915 | 1.8%
Los Angeles | 747 | 1.5%
Philadelphia | 537 | 1.0%
San Francisco | 510 | 1.0%
Santo Domingo | 443 | 0.9%
Manila | 432 | 0.8%
Seattle | 428 | 0.8%
Houston | 377 | 0.7%
Tegucigalpa | 362 | 0.7%
Jakarta | 337 | 0.7%
Other values (3640) | 46202 | 90.1%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
city | 1787 | 2.8%
san | 1672 | 2.6%
new | 958 | 1.5%
york | 950 | 1.5%
los | 874 | 1.4%
angeles | 751 | 1.2%
de | 598 | 0.9%
francisco | 557 | 0.9%
philadelphia | 537 | 0.8%
santo | 465 | 0.7%
Other values (3820) | 54682 | 85.7%

Most occurring characters

Value | Count | Frequency (%)
a | 55167 | 12.8%
n | 32505 | 7.5%
e | 31858 | 7.4%
o | 30454 | 7.1%
i | 27007 | 6.3%
r | 23832 | 5.5%
l | 21397 | 5.0%
s | 16138 | 3.7%
t | 15961 | 3.7%
u | 15623 | 3.6%
Other values (67) | 161860 | 37.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 354192 | 82.0%
Uppercase Letter | 63398 | 14.7%
Space Separator | 12541 | 2.9%
Dash Punctuation | 1312 | 0.3%
Other Punctuation | 349 | 0.1%
Open Punctuation | 4 | < 0.1%
Close Punctuation | 4 | < 0.1%
Final Punctuation | 2 | < 0.1%

Most frequent character per category

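Lowercase Letter
Value | Count | Frequency (%)
a | 55167 | 15.6%
n | 32505 | 9.2%
e | 31858 | 9.0%
o | 30454 | 8.6%
i | 27007 | 7.6%
r | 23832 | 6.7%
l | 21397 | 6.0%
s | 16138 | 4.6%
t | 15961 | 4.5%
u | 15623 | 4.4%
Other values (33) | 84250 | 23.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 7458 | 11.8%
C | 7196 | 11.4%
M | 6049 | 9.5%
B | 4728 | 7.5%
L | 4203 | 6.6%
A | 4137 | 6.5%
P | 4013 | 6.3%
T | 2649 | 4.2%
D | 2553 | 4.0%
N | 2456 | 3.9%
Other values (17) | 17956 | 28.3%

Other Punctuation
Value | Count | Frequency (%)
' | 341 | 97.7%
. | 8 | 2.3%

Space Separator
Value | Count | Frequency (%)
(space) | 12541 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1312 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 4 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 4 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(not rendered) | 2 | 100.0%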

Most occurring scripts

Value | Count | Frequency (%)
Latin | 417590 | 96.7%
Common | 14212 | 3.3%

Most frequent character per script

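Latin
Value | Count | Frequency (%)
a | 55167 | 13.2%
n | 32505 | 7.8%
e | 31858 | 7.6%
o | 30454 | 7.3%
i | 27007 | 6.5%
r | 23832 | 5.7%
l | 21397 | 5.1%
s | 16138 | 3.9%
t | 15961 | 3.8%
u | 15623 | 3.7%
Other values (60) | 147648 | 35.4%

Common
Value | Count | Frequency (%)
(space) | 12541 | 88.2%
- | 1312 | 9.2%
' | 341 | 2.4%
. | 8 | 0.1%
( | 4 | < 0.1%
) | 4 | < 0.1%
(not rendered) | 2 | < 0.1%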

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 429375 | 99.4%
None | 2425 | 0.6%
Punctuation | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 55167 | 12.8%
n | 32505 | 7.6%
e | 31858 | 7.4%
o | 30454 | 7.1%
i | 27007 | 6.3%
r | 23832 | 5.6%
l | 21397 | 5.0%
s | 16138 | 3.8%
t | 15961 | 3.7%
u | 15623 | 3.6%
Other values (48) | 159433 | 37.1%

None
Value | Count | Frequency (%)
á | 643 | 26.5%
í | 507 | 20.9%
ó | 410 | 16.9%
é | 290 | 12.0%
ã | 261 | 10.8%
ú | 89 | 3.7%
ü | 55 | 2.3%
ç | 52 | 2.1%
ñ | 34 | 1.4%
Á | 32 | 1.3%
Other values (8) | 52 | 2.1%

Punctuation
Value | Count | Frequency (%)
(not rendered) | 2 | 100.0%

State
Categorical

HIGH CARDINALITY

Distinct: 1102
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
California: 2001
England: 1499
New York: 1128
Texas: 985
Ile-de-France: 981
Other values (1097): 44696

Length

Max length: 36
Median length: 8
Mean length: 9.997387405
Min length: 3

Characters and Unicode

Total characters: 512766
Distinct characters: 85
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 68
Unique (%): 0.1%

Sample

1st row: Oklahoma
2nd row: New South Wales
3rd row: Queensland
4th row: Berlin
5th row: Dakar
Common values
Value | Count | Frequency (%)
California | 2001 | 3.9%
England | 1499 | 2.9%
New York | 1128 | 2.2%
Texas | 985 | 1.9%
Ile-de-France | 981 | 1.9%
New South Wales | 781 | 1.5%
North Rhine-Westphalia | 719 | 1.4%
Queensland | 717 | 1.4%
San Salvador | 615 | 1.2%
Pennsylvania | 587 | 1.1%
Other values (1092) | 41277 | 80.5%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
california | 2125 | 3.1%
new | 2103 | 3.1%
england | 1499 | 2.2%
south | 1201 | 1.8%
north | 1145 | 1.7%
york | 1128 | 1.7%
texas | 985 | 1.4%
ile-de-france | 981 | 1.4%
wales | 817 | 1.2%
capital | 784 | 1.1%
Other values (1200) | 55441 | 81.3%

Most occurring characters

Value | Count | Frequency (%)
a | 74178 | 14.5%
n | 40975 | 8.0%
i | 34897 | 6.8%
e | 33043 | 6.4%
o | 29586 | 5.8%
r | 29423 | 5.7%
l | 24353 | 4.7%
t | 21702 | 4.2%
s | 20399 | 4.0%
(space) | 16919 | 3.3%
Other values (75) | 187291 | 36.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 412939 | 80.5%
Uppercase Letter | 73884 | 14.4%
Space Separator | 16919 | 3.3%
Dash Punctuation | 8007 | 1.6%
Other Punctuation | 811 | 0.2%
Open Punctuation | 103 | < 0.1%
Close Punctuation | 103 | < 0.1%

Most frequent character per category

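Lowercase Letter
Value | Count | Frequency (%)
a | 74178 | 18.0%
n | 40975 | 9.9%
i | 34897 | 8.5%
e | 33043 | 8.0%
o | 29586 | 7.2%
r | 29423 | 7.1%
l | 24353 | 5.9%
t | 21702 | 5.3%
s | 20399 | 4.9%
u | 15519 | 3.8%
Other values (40) | 88864 | 21.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 7952 | 10.8%
S | 6864 | 9.3%
A | 6031 | 8.2%
N | 5045 | 6.8%
P | 4439 | 6.0%
M | 4195 | 5.7%
B | 3613 | 4.9%
T | 3083 | 4.2%
W | 3017 | 4.1%
L | 2950 | 4.0%
Other values (19) | 26695 | 36.1%

Other Punctuation
Value | Count | Frequency (%)
' | 764 | 94.2%
. | 47 | 5.8%

Space Separator
Value | Count | Frequency (%)
(space) | 16919 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 8007 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 103 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 103 | 100.0%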

Most occurring scripts

Value | Count | Frequency (%)
Latin | 486823 | 94.9%
Common | 25943 | 5.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 74178 | 15.2%
n | 40975 | 8.4%
i | 34897 | 7.2%
e | 33043 | 6.8%
o | 29586 | 6.1%
r | 29423 | 6.0%
l | 24353 | 5.0%
t | 21702 | 4.5%
s | 20399 | 4.2%
u | 15519 | 3.2%
Other values (69) | 162748 | 33.4%

Common
Value | Count | Frequency (%)
(space) | 16919 | 65.2%
- | 8007 | 30.9%
' | 764 | 2.9%
( | 103 | 0.4%
) | 103 | 0.4%
. | 47 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 507931 | 99.1%
None | 4713 | 0.9%
Latin Ext Additional | 122 | < 0.1%

Most frequent character per block

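ASCII
Value | Count | Frequency (%)
a | 74178 | 14.6%
n | 40975 | 8.1%
i | 34897 | 6.9%
e | 33043 | 6.5%
o | 29586 | 5.8%
r | 29423 | 5.8%
l | 24353 | 4.8%
t | 21702 | 4.3%
s | 20399 | 4.0%
(space) | 16919 | 3.3%
Other values (48) | 182456 | 35.9%

None
Value | Count | Frequency (%)
é | 1142 | 24.2%
á | 873 | 18.5%
í | 712 | 15.1%
ô | 672 | 14.3%
ã | 472 | 10.0%
ó | 291 | 6.2%
ü | 260 | 5.5%
è | 63 | 1.3%
à | 58 | 1.2%
Á | 30 | 0.6%
Other values (13) | 140 | 3.0%

Latin Ext Additional
Value | Count | Frequency (%)
(not rendered) | 48 | 39.3%
(not rendered) | 48 | 39.3%
(not rendered) | 13 | 10.7%
(not rendered) | 13 | 10.7%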

Country
Categorical

HIGH CARDINALITY

Distinct: 165
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
United States: 9994
Australia: 2837
France: 2827
Mexico: 2635
Germany: 2063
Other values (160): 30934

Length

Max length: 32
Median length: 8
Mean length: 8.837492689
Min length: 4

Characters and Unicode

Total characters: 453275
Distinct characters: 56
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: United States
2nd row: Australia
3rd row: Australia
4th row: Germany
5th row: Senegal
Common values
Value | Count | Frequency (%)
United States | 9994 | 19.5%
Australia | 2837 | 5.5%
France | 2827 | 5.5%
Mexico | 2635 | 5.1%
Germany | 2063 | 4.0%
China | 1880 | 3.7%
United Kingdom | 1633 | 3.2%
Brazil | 1593 | 3.1%
India | 1554 | 3.0%
Indonesia | 1390 | 2.7%
Other values (155) | 22884 | 44.6%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
united | 11641 | 17.1%
states | 9994 | 14.7%
australia | 2837 | 4.2%
france | 2827 | 4.2%
mexico | 2635 | 3.9%
germany | 2063 | 3.0%
china | 1880 | 2.8%
kingdom | 1633 | 2.4%
brazil | 1593 | 2.3%
india | 1554 | 2.3%
Other values (176) | 29447 | 43.2%

Most occurring characters

Value | Count | Frequency (%)
a | 55156 | 12.2%
e | 43178 | 9.5%
i | 40565 | 8.9%
t | 40484 | 8.9%
n | 36892 | 8.1%
d | 21296 | 4.7%
r | 20396 | 4.5%
s | 18378 | 4.1%
(space) | 16814 | 3.7%
o | 14464 | 3.2%
Other values (46) | 145652 | 32.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 368777 | 81.4%
Uppercase Letter | 67296 | 14.8%
Space Separator | 16814 | 3.7%
Open Punctuation | 136 | < 0.1%
Close Punctuation | 136 | < 0.1%
Other Punctuation | 107 | < 0.1%
Dash Punctuation | 9 | < 0.1%

Most frequent character per category

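Lowercase Letter
Value | Count | Frequency (%)
a | 55156 | 15.0%
e | 43178 | 11.7%
i | 40565 | 11.0%
t | 40484 | 11.0%
n | 36892 | 10.0%
d | 21296 | 5.8%
r | 20396 | 5.5%
s | 18378 | 5.0%
o | 14464 | 3.9%
l | 13487 | 3.7%
Other values (16) | 64481 | 17.5%

Uppercase Letter
Value | Count | Frequency (%)
S | 13320 | 19.8%
U | 12131 | 18.0%
I | 5355 | 8.0%
A | 4815 | 7.2%
C | 4259 | 6.3%
M | 3710 | 5.5%
F | 2899 | 4.3%
G | 2805 | 4.2%
N | 2745 | 4.1%
B | 2344 | 3.5%
Other values (15) | 12913 | 19.2%

Space Separator
Value | Count | Frequency (%)
(space) | 16814 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 136 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 136 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
' | 107 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 9 | 100.0%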

Most occurring scripts

Value | Count | Frequency (%)
Latin | 436073 | 96.2%
Common | 17202 | 3.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 55156 | 12.6%
e | 43178 | 9.9%
i | 40565 | 9.3%
t | 40484 | 9.3%
n | 36892 | 8.5%
d | 21296 | 4.9%
r | 20396 | 4.7%
s | 18378 | 4.2%
o | 14464 | 3.3%
l | 13487 | 3.1%
Other values (41) | 131777 | 30.2%

Common
Value | Count | Frequency (%)
(space) | 16814 | 97.7%
( | 136 | 0.8%
) | 136 | 0.8%
' | 107 | 0.6%
- | 9 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 453275 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 55156 | 12.2%
e | 43178 | 9.5%
i | 40565 | 8.9%
t | 40484 | 8.9%
n | 36892 | 8.1%
d | 21296 | 4.7%
r | 20396 | 4.5%
s | 18378 | 4.1%
(space) | 16814 | 3.7%
o | 14464 | 3.2%
Other values (46) | 145652 | 32.1%

Region
Categorical

HIGH CORRELATION

Distinct: 23
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Western Europe: 5883
Central America: 5616
Oceania: 3487
Western US: 3203
Southeastern Asia: 3129
Other values (18): 29972

Length

Max length: 17
Median length: 13
Mean length: 12.58159485
Min length: 6

Characters and Unicode

Total characters: 645310
Distinct characters: 26
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Central US
2nd row: Oceania
3rd row: Oceania
4th row: Western Europe
5th row: Western Africa
Common values
Value | Count | Frequency (%)
Western Europe | 5883 | 11.5%
Central America | 5616 | 10.9%
Oceania | 3487 | 6.8%
Western US | 3203 | 6.2%
Southeastern Asia | 3129 | 6.1%
South America | 2988 | 5.8%
Eastern US | 2848 | 5.6%
Southern Asia | 2655 | 5.2%
Western Asia | 2440 | 4.8%
Eastern Asia | 2374 | 4.6%
Other values (13) | 16667 | 32.5%
Histogram of lengths of the category
Words (lowercased)
Value | Count | Frequency (%)
western | 12986 | 13.4%
europe | 11729 | 12.1%
asia | 10815 | 11.1%
us | 9994 | 10.3%
central | 8799 | 9.1%
america | 8604 | 8.9%
eastern | 7479 | 7.7%
southern | 6866 | 7.1%
africa | 4587 | 4.7%
oceania | 3487 | 3.6%
Other values (6) | 11673 | 12.0%

Most occurring characters

Value | Count | Frequency (%)
e | 83088 | 12.9%
r | 71555 | 11.1%
a | 54919 | 8.5%
t | 48858 | 7.6%
n | 47024 | 7.3%
(space) | 45729 | 7.1%
s | 34409 | 5.3%
i | 29183 | 4.5%
o | 28194 | 4.4%
u | 24712 | 3.8%
Other values (16) | 177639 | 27.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 492568 | 76.3%
Uppercase Letter | 107013 | 16.6%
Space Separator | 45729 | 7.1%

Most frequent character per category

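Lowercase Letter
Value | Count | Frequency (%)
e | 83088 | 16.9%
r | 71555 | 14.5%
a | 54919 | 11.1%
t | 48858 | 9.9%
n | 47024 | 9.5%
s | 34409 | 7.0%
i | 29183 | 5.9%
o | 28194 | 5.7%
u | 24712 | 5.0%
c | 16678 | 3.4%
Other values (7) | 53948 | 11.0%

Uppercase Letter
Value | Count | Frequency (%)
A | 24006 | 22.4%
S | 22977 | 21.5%
E | 19208 | 17.9%
W | 12986 | 12.1%
C | 10873 | 10.2%
U | 9994 | 9.3%
O | 3487 | 3.3%
N | 3482 | 3.3%

Space Separator
Value | Count | Frequency (%)
(space) | 45729 | 100.0%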

Most occurring scripts

Value | Count | Frequency (%)
Latin | 599581 | 92.9%
Common | 45729 | 7.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 83088 | 13.9%
r | 71555 | 11.9%
a | 54919 | 9.2%
t | 48858 | 8.1%
n | 47024 | 7.8%
s | 34409 | 5.7%
i | 29183 | 4.9%
o | 28194 | 4.7%
u | 24712 | 4.1%
A | 24006 | 4.0%
Other values (15) | 153633 | 25.6%

Common
Value | Count | Frequency (%)
(space) | 45729 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 645310 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 83088 | 12.9%
r | 71555 | 11.1%
a | 54919 | 8.5%
t | 48858 | 7.6%
n | 47024 | 7.3%
(space) | 45729 | 7.1%
s | 34409 | 5.3%
i | 29183 | 4.5%
o | 28194 | 4.4%
u | 24712 | 3.8%
Other values (16) | 177639 | 27.5%

Market
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Asia Pacific: 14302
Europe: 11729
USCA: 10378
LATAM: 10294
Africa: 4587
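Market and Region are flagged as highly correlated with each other in the Warnings list. For pairs of categorical variables, pandas-profiling measures association with Cramér's V. V reaches 1 when each category of one variable determines the category of the other. If every Region belongs to exactly one Market, as the names suggest, the pair should score at or near that maximum. The sketch below computes plain, uncorrected Cramér's V from a contingency table with the standard library. The table is hypothetical, and pandas-profiling's own implementation may apply a bias correction.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows
    of counts (rows: categories of one variable, columns: of the other).
    Assumes every row and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical counts: rows are regions, columns are markets.
# Each region falls in a single market, so V should be 1.
nested = [
    [40, 0],   # region 1 -> market X
    [25, 0],   # region 2 -> market X
    [0, 35],   # region 3 -> market Y
]
print(cramers_v(nested))

# Proportional rows mean the two variables are independent, so V = 0.
print(cramers_v([[10, 20], [30, 60]]))
```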

Length

Max length: 12
Median length: 6
Mean length: 7.067693508
Min length: 4

Characters and Unicode

Total characters: 362502
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: USCA
2nd row: Asia Pacific
3rd row: Asia Pacific
4th row: Europe
5th row: Africa
Common values
Value | Count | Frequency (%)
Asia Pacific | 14302 | 27.9%
Europe | 11729 | 22.9%
USCA | 10378 | 20.2%
LATAM | 10294 | 20.1%
Africa | 4587 | 8.9%
Histogram of lengths of the category
2021-02-20T15:00:34.889806image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
ValueCountFrequency (%)
pacific14302
21.8%
asia14302
21.8%
europe11729
17.9%
usca10378
15.8%
latam10294
15.7%
africa4587
 
7.0%
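The word table appears to lowercase each value, split it on whitespace and strip punctuation from the ends of each token. This is consistent with hyphenated codes such as off-fa-6129 surviving as single words in the Product ID section, but it is inferred from the output rather than taken from the profiler. A minimal sketch under that assumption (word_counts is a hypothetical helper):

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace, strip leading/trailing punctuation, count."""
    counts = Counter()
    for text in values:
        for token in text.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens that consisted only of punctuation
                counts[word] += 1
    return counts


sample = ["USCA", "Asia Pacific", "Asia Pacific", "Europe", "Africa"]
print(word_counts(sample))
```

Word frequencies are relative to the total number of words, not rows: Market has 65592 words in all, so the 14302 occurrences of pacific give 21.8%.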

Most occurring characters

Value | Count | Frequency (%)
A | 49855 | 13.8%
i | 47493 | 13.1%
a | 33191 | 9.2%
c | 33191 | 9.2%
f | 18889 | 5.2%
r | 16316 | 4.5%
s | 14302 | 3.9%
(space) | 14302 | 3.9%
P | 14302 | 3.9%
E | 11729 | 3.2%
Other values (10) | 108932 | 30.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 210298 | 58.0%
Uppercase Letter | 137902 | 38.0%
Space Separator | 14302 | 3.9%

Most frequent character per category

Value | Count | Frequency (%)
i | 47493 | 22.6%
a | 33191 | 15.8%
c | 33191 | 15.8%
f | 18889 | 9.0%
r | 16316 | 7.8%
s | 14302 | 6.8%
u | 11729 | 5.6%
o | 11729 | 5.6%
p | 11729 | 5.6%
e | 11729 | 5.6%

Value | Count | Frequency (%)
A | 49855 | 36.2%
P | 14302 | 10.4%
E | 11729 | 8.5%
U | 10378 | 7.5%
S | 10378 | 7.5%
C | 10378 | 7.5%
L | 10294 | 7.5%
T | 10294 | 7.5%
M | 10294 | 7.5%

Value | Count | Frequency (%)
(space) | 14302 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 348200 | 96.1%
Common | 14302 | 3.9%

Most frequent character per script

Value | Count | Frequency (%)
A | 49855 | 14.3%
i | 47493 | 13.6%
a | 33191 | 9.5%
c | 33191 | 9.5%
f | 18889 | 5.4%
r | 16316 | 4.7%
s | 14302 | 4.1%
P | 14302 | 4.1%
E | 11729 | 3.4%
u | 11729 | 3.4%
Other values (9) | 97203 | 27.9%

Value | Count | Frequency (%)
(space) | 14302 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 362502 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
A | 49855 | 13.8%
i | 47493 | 13.1%
a | 33191 | 9.2%
c | 33191 | 9.2%
f | 18889 | 5.2%
r | 16316 | 4.5%
s | 14302 | 3.9%
(space) | 14302 | 3.9%
P | 14302 | 3.9%
E | 11729 | 3.2%
Other values (10) | 108932 | 30.1%

Product ID
Categorical

HIGH CARDINALITY

Distinct: 3788
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
OFF-FA-6129: 227
OFF-BI-3737: 92
OFF-ST-4057: 90
OFF-ST-5693: 84
OFF-BI-4828: 83
Other values (3783): 50714

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 564190
Distinct characters: 27
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 98
Unique (%): 0.2%

Sample

1st row: TEC-PH-5816
2nd row: FUR-CH-5379
3rd row: TEC-PH-5356
4th row: TEC-PH-5267
5th row: TEC-CO-6011

Value | Count | Frequency (%)
OFF-FA-6129 | 227 | 0.4%
OFF-BI-3737 | 92 | 0.2%
OFF-ST-4057 | 90 | 0.2%
OFF-ST-5693 | 84 | 0.2%
OFF-BI-4828 | 83 | 0.2%
OFF-AR-5923 | 80 | 0.2%
OFF-ST-6033 | 77 | 0.2%
OFF-BI-2917 | 75 | 0.1%
OFF-AR-6120 | 75 | 0.1%
OFF-BI-3293 | 74 | 0.1%
Other values (3778) | 50333 | 98.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
off-fa-6129 | 227 | 0.4%
off-bi-3737 | 92 | 0.2%
off-st-4057 | 90 | 0.2%
off-st-5693 | 84 | 0.2%
off-bi-4828 | 83 | 0.2%
off-ar-5923 | 80 | 0.2%
off-st-6033 | 77 | 0.2%
off-ar-6120 | 75 | 0.1%
off-bi-2917 | 75 | 0.1%
off-bi-3293 | 74 | 0.1%
Other values (3778) | 50333 | 98.1%
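The "Other values (N)" rows follow from keeping the k most frequent values and pooling the rest: with 3788 distinct Product IDs, k = 10 leaves 3778 pooled values and k = 5 leaves 3783, matching both tables. A pandas sketch of that grouping; top_values is a hypothetical helper, and the profiler's own code may differ.

```python
import pandas as pd


def top_values(s: pd.Series, k: int = 10) -> pd.DataFrame:
    """Top-k value counts plus an 'Other values (N)' row for the remainder."""
    counts = s.value_counts()
    rows = [(str(value), int(count)) for value, count in counts.iloc[:k].items()]
    rest = counts.iloc[k:]
    if len(rest) > 0:
        rows.append((f"Other values ({len(rest)})", int(rest.sum())))
    table = pd.DataFrame(rows, columns=["Value", "Count"])
    # Frequency relative to len(s), i.e. all rows including any missing values.
    table["Frequency (%)"] = (100 * table["Count"] / len(s)).round(1)
    return table


s = pd.Series(["a", "a", "a", "b", "b", "c", "d"])
print(top_values(s, k=2))
```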

Most occurring characters

Value | Count | Frequency (%)
- | 102580 | 18.2%
F | 78193 | 13.9%
O | 35923 | 6.4%
4 | 29191 | 5.2%
3 | 28497 | 5.1%
5 | 27577 | 4.9%
6 | 23305 | 4.1%
A | 20722 | 3.7%
C | 18873 | 3.3%
2 | 18188 | 3.2%
Other values (17) | 181141 | 32.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 256450 | 45.5%
Decimal Number | 205160 | 36.4%
Dash Punctuation | 102580 | 18.2%

Most frequent character per category

Value | Count | Frequency (%)
F | 78193 | 30.5%
O | 35923 | 14.0%
A | 20722 | 8.1%
C | 18873 | 7.4%
T | 16051 | 6.3%
U | 15421 | 6.0%
R | 14724 | 5.7%
E | 12528 | 4.9%
P | 8591 | 3.3%
B | 8557 | 3.3%
Other values (6) | 26867 | 10.5%

Value | Count | Frequency (%)
4 | 29191 | 14.2%
3 | 28497 | 13.9%
5 | 27577 | 13.4%
6 | 23305 | 11.4%
2 | 18188 | 8.9%
9 | 16608 | 8.1%
0 | 16205 | 7.9%
1 | 15724 | 7.7%
8 | 15303 | 7.5%
7 | 14562 | 7.1%

Value | Count | Frequency (%)
- | 102580 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 307740 | 54.5%
Latin | 256450 | 45.5%

Most frequent character per script

Value | Count | Frequency (%)
F | 78193 | 30.5%
O | 35923 | 14.0%
A | 20722 | 8.1%
C | 18873 | 7.4%
T | 16051 | 6.3%
U | 15421 | 6.0%
R | 14724 | 5.7%
E | 12528 | 4.9%
P | 8591 | 3.3%
B | 8557 | 3.3%
Other values (6) | 26867 | 10.5%

Value | Count | Frequency (%)
- | 102580 | 33.3%
4 | 29191 | 9.5%
3 | 28497 | 9.3%
5 | 27577 | 9.0%
6 | 23305 | 7.6%
2 | 18188 | 5.9%
9 | 16608 | 5.4%
0 | 16205 | 5.3%
1 | 15724 | 5.1%
8 | 15303 | 5.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 564190 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
- | 102580 | 18.2%
F | 78193 | 13.9%
O | 35923 | 6.4%
4 | 29191 | 5.2%
3 | 28497 | 5.1%
5 | 27577 | 4.9%
6 | 23305 | 4.1%
A | 20722 | 3.7%
C | 18873 | 3.3%
2 | 18188 | 3.2%
Other values (17) | 181141 | 32.1%

Category
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Office Supplies: 31289
Technology: 10141
Furniture: 9860

Length

Max length: 15
Median length: 15
Mean length: 12.85796452
Min length: 9

Characters and Unicode

Total characters: 659485
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Technology
2nd row: Furniture
3rd row: Technology
4th row: Technology
5th row: Technology

Value | Count | Frequency (%)
Office Supplies | 31289 | 61.0%
Technology | 10141 | 19.8%
Furniture | 9860 | 19.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
office | 31289 | 37.9%
supplies | 31289 | 37.9%
technology | 10141 | 12.3%
furniture | 9860 | 11.9%

Most occurring characters

Value | Count | Frequency (%)
e | 82579 | 12.5%
i | 72438 | 11.0%
f | 62578 | 9.5%
p | 62578 | 9.5%
u | 51009 | 7.7%
c | 41430 | 6.3%
l | 41430 | 6.3%
O | 31289 | 4.7%
(space) | 31289 | 4.7%
S | 31289 | 4.7%
Other values (10) | 151576 | 23.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 545617 | 82.7%
Uppercase Letter | 82579 | 12.5%
Space Separator | 31289 | 4.7%

Most frequent character per category

Value | Count | Frequency (%)
e | 82579 | 15.1%
i | 72438 | 13.3%
f | 62578 | 11.5%
p | 62578 | 11.5%
u | 51009 | 9.3%
c | 41430 | 7.6%
l | 41430 | 7.6%
s | 31289 | 5.7%
o | 20282 | 3.7%
n | 20001 | 3.7%
Other values (5) | 60003 | 11.0%

Value | Count | Frequency (%)
O | 31289 | 37.9%
S | 31289 | 37.9%
T | 10141 | 12.3%
F | 9860 | 11.9%

Value | Count | Frequency (%)
(space) | 31289 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 628196 | 95.3%
Common | 31289 | 4.7%

Most frequent character per script

Value | Count | Frequency (%)
e | 82579 | 13.1%
i | 72438 | 11.5%
f | 62578 | 10.0%
p | 62578 | 10.0%
u | 51009 | 8.1%
c | 41430 | 6.6%
l | 41430 | 6.6%
O | 31289 | 5.0%
S | 31289 | 5.0%
s | 31289 | 5.0%
Other values (9) | 120287 | 19.1%

Value | Count | Frequency (%)
(space) | 31289 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 659485 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
e | 82579 | 12.5%
i | 72438 | 11.0%
f | 62578 | 9.5%
p | 62578 | 9.5%
u | 51009 | 7.7%
c | 41430 | 6.3%
l | 41430 | 6.3%
O | 31289 | 4.7%
(space) | 31289 | 4.7%
S | 31289 | 4.7%
Other values (10) | 151576 | 23.0%

Sub-Category
Categorical

HIGH CORRELATION

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Binders: 6146
Storage: 5049
Art: 4864
Paper: 3492
Chairs: 3434
Other values (12): 28305

Length

Max length: 11
Median length: 7
Mean length: 7.236693313
Min length: 3

Characters and Unicode

Total characters: 371170
Distinct characters: 28
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Phones
2nd row: Chairs
3rd row: Phones
4th row: Phones
5th row: Copiers

Value | Count | Frequency (%)
Binders | 6146 | 12.0%
Storage | 5049 | 9.8%
Art | 4864 | 9.5%
Paper | 3492 | 6.8%
Chairs | 3434 | 6.7%
Phones | 3357 | 6.5%
Furnishings | 3154 | 6.1%
Accessories | 3075 | 6.0%
Labels | 2601 | 5.1%
Fasteners | 2601 | 5.1%
Other values (7) | 13517 | 26.4%

Histogram of lengths of the category

Value | Count | Frequency (%)
binders | 6146 | 12.0%
storage | 5049 | 9.8%
art | 4864 | 9.5%
paper | 3492 | 6.8%
chairs | 3434 | 6.7%
phones | 3357 | 6.5%
furnishings | 3154 | 6.1%
accessories | 3075 | 6.0%
fasteners | 2601 | 5.1%
labels | 2601 | 5.1%
Other values (7) | 13517 | 26.4%

Most occurring characters

Value | Count | Frequency (%)
s | 52201 | 14.1%
e | 47901 | 12.9%
r | 34038 | 9.2%
i | 26821 | 7.2%
n | 24027 | 6.5%
a | 23677 | 6.4%
o | 20913 | 5.6%
p | 16400 | 4.4%
t | 12514 | 3.4%
c | 11789 | 3.2%
Other values (18) | 100889 | 27.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 319880 | 86.2%
Uppercase Letter | 51290 | 13.8%

Most frequent character per category

Value | Count | Frequency (%)
s | 52201 | 16.3%
e | 47901 | 15.0%
r | 34038 | 10.6%
i | 26821 | 8.4%
n | 24027 | 7.5%
a | 23677 | 7.4%
o | 20913 | 6.5%
p | 16400 | 5.1%
t | 12514 | 3.9%
c | 11789 | 3.7%
Other values (8) | 49599 | 15.5%

Value | Count | Frequency (%)
A | 9681 | 18.9%
B | 8557 | 16.7%
S | 7456 | 14.5%
P | 6849 | 13.4%
F | 5755 | 11.2%
C | 5657 | 11.0%
L | 2601 | 5.1%
E | 2387 | 4.7%
M | 1486 | 2.9%
T | 861 | 1.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 371170 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
s | 52201 | 14.1%
e | 47901 | 12.9%
r | 34038 | 9.2%
i | 26821 | 7.2%
n | 24027 | 6.5%
a | 23677 | 6.4%
o | 20913 | 5.6%
p | 16400 | 4.4%
t | 12514 | 3.4%
c | 11789 | 3.2%
Other values (18) | 100889 | 27.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 371170 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
s | 52201 | 14.1%
e | 47901 | 12.9%
r | 34038 | 9.2%
i | 26821 | 7.2%
n | 24027 | 6.5%
a | 23677 | 6.4%
o | 20913 | 5.6%
p | 16400 | 4.4%
t | 12514 | 3.4%
c | 11789 | 3.2%
Other values (18) | 100889 | 27.2%

Product Name
Categorical

HIGH CARDINALITY

Distinct: 3788
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Staples: 227
Cardinal Index Tab, Clear: 92
Eldon File Cart, Single Width: 90
Rogers File Cart, Single Width: 84
Ibico Index Tab, Clear: 83
Other values (3783): 50714

Length

Max length: 127
Median length: 29
Mean length: 30.85693118
Min length: 5

Characters and Unicode

Total characters: 1582652
Distinct characters: 85
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 98
Unique (%): 0.2%

Sample

1st row: Samsung Convoy 3
2nd row: Novimex Executive Leather Armchair, Black
3rd row: Nokia Smart Phone, with Caller ID
4th row: Motorola Smart Phone, Cordless
5th row: Sharp Wireless Fax, High-Speed

Value | Count | Frequency (%)
Staples | 227 | 0.4%
Cardinal Index Tab, Clear | 92 | 0.2%
Eldon File Cart, Single Width | 90 | 0.2%
Rogers File Cart, Single Width | 84 | 0.2%
Ibico Index Tab, Clear | 83 | 0.2%
Sanford Pencil Sharpener, Water Color | 80 | 0.2%
Smead File Cart, Single Width | 77 | 0.2%
Stanley Pencil Sharpener, Water Color | 75 | 0.1%
Acco Index Tab, Clear | 75 | 0.1%
Avery Index Tab, Clear | 74 | 0.1%
Other values (3778) | 50333 | 98.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
labels | 2385 | 1.0%
recycled | 2291 | 1.0%
color | 2187 | 0.9%
with | 2177 | 0.9%
set | 2106 | 0.9%
blue | 2092 | 0.9%
durable | 2072 | 0.9%
black | 2055 | 0.9%
avery | 1920 | 0.8%
clear | 1893 | 0.8%
Other values (2826) | 210320 | 90.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 179838 | 11.4%
e | 154618 | 9.8%
a | 94421 | 6.0%
r | 91563 | 5.8%
o | 88370 | 5.6%
l | 79902 | 5.0%
i | 79392 | 5.0%
n | 68089 | 4.3%
t | 62491 | 3.9%
s | 60638 | 3.8%
Other values (75) | 623330 | 39.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1084788 | 68.5%
Uppercase Letter | 235084 | 14.9%
Space Separator | 180265 | 11.4%
Other Punctuation | 50142 | 3.2%
Decimal Number | 25561 | 1.6%
Dash Punctuation | 6566 | 0.4%
Final Punctuation | 67 | < 0.1%
Open Punctuation | 60 | < 0.1%
Close Punctuation | 60 | < 0.1%
Math Symbol | 35 | < 0.1%
Other values (2) | 24 | < 0.1%

Most frequent character per category

Value | Count | Frequency (%)
e | 154618 | 14.3%
a | 94421 | 8.7%
r | 91563 | 8.4%
o | 88370 | 8.1%
l | 79902 | 7.4%
i | 79392 | 7.3%
n | 68089 | 6.3%
t | 62491 | 5.8%
s | 60638 | 5.6%
c | 43284 | 4.0%
Other values (18) | 262020 | 24.2%

Value | Count | Frequency (%)
S | 33233 | 14.1%
C | 27670 | 11.8%
B | 22724 | 9.7%
P | 18037 | 7.7%
E | 12943 | 5.5%
A | 12468 | 5.3%
F | 12209 | 5.2%
M | 10589 | 4.5%
R | 10364 | 4.4%
T | 10107 | 4.3%
Other values (16) | 64740 | 27.5%

Value | Count | Frequency (%)
, | 44416 | 88.6%
/ | 1561 | 3.1%
& | 1446 | 2.9%
" | 1300 | 2.6%
. | 998 | 2.0%
' | 257 | 0.5%
# | 90 | 0.2%
% | 45 | 0.1%
! | 9 | < 0.1%
* | 9 | < 0.1%
Other values (2) | 11 | < 0.1%

Value | Count | Frequency (%)
1 | 5377 | 21.0%
0 | 5118 | 20.0%
5 | 3094 | 12.1%
2 | 2756 | 10.8%
3 | 2628 | 10.3%
8 | 1808 | 7.1%
4 | 1725 | 6.7%
9 | 1234 | 4.8%
6 | 941 | 3.7%
7 | 880 | 3.4%

Value | Count | Frequency (%)
(space) | 179838 | 99.8%
(no-break space) | 427 | 0.2%

Value | Count | Frequency (%)
- | 6566 | 100.0%

Value | Count | Frequency (%)
(unrendered character) | 19 | 100.0%

Value | Count | Frequency (%)
(unrendered character) | 67 | 100.0%

Value | Count | Frequency (%)
¾ | 5 | 100.0%

Value | Count | Frequency (%)
( | 60 | 100.0%

Value | Count | Frequency (%)
) | 60 | 100.0%

Value | Count | Frequency (%)
+ | 35 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1319872 | 83.4%
Common | 262780 | 16.6%

Most frequent character per script

Value | Count | Frequency (%)
e | 154618 | 11.7%
a | 94421 | 7.2%
r | 91563 | 6.9%
o | 88370 | 6.7%
l | 79902 | 6.1%
i | 79392 | 6.0%
n | 68089 | 5.2%
t | 62491 | 4.7%
s | 60638 | 4.6%
c | 43284 | 3.3%
Other values (44) | 497104 | 37.7%

Value | Count | Frequency (%)
(space) | 179838 | 68.4%
, | 44416 | 16.9%
- | 6566 | 2.5%
1 | 5377 | 2.0%
0 | 5118 | 1.9%
5 | 3094 | 1.2%
2 | 2756 | 1.0%
3 | 2628 | 1.0%
8 | 1808 | 0.7%
4 | 1725 | 0.7%
Other values (21) | 9454 | 3.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1582117 | > 99.9%
None | 449 | < 0.1%
Punctuation | 86 | < 0.1%

Most frequent character per block

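Value | Count | Frequency (%)
(space) | 179838 | 11.4%
e | 154618 | 9.8%
a | 94421 | 6.0%
r | 91563 | 5.8%
o | 88370 | 5.6%
l | 79902 | 5.1%
i | 79392 | 5.0%
n | 68089 | 4.3%
t | 62491 | 3.9%
s | 60638 | 3.8%
Other values (69) | 622795 | 39.4%

Value | Count | Frequency (%)
(unrendered character) | 67 | 77.9%
(unrendered character) | 19 | 22.1%

Value | Count | Frequency (%)
(no-break space) | 427 | 95.1%
é | 14 | 3.1%
¾ | 5 | 1.1%
à | 3 | 0.7%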

Sales
Real number (ℝ≥0)

Distinct: 27200
Distinct (%): 53.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 246.4905812
Minimum: 0.444
Maximum: 22638.48
Zeros: 0
Zeros (%): 0.0%
Memory size: 400.7 KiB

Quantile statistics

Minimum: 0.444
5th percentile: 8.8
Q1: 30.758625
Median: 85.053
Q3: 251.0532
95th percentile: 1015.95564
Maximum: 22638.48
Range: 22638.036
Interquartile range (IQR): 220.294575

Descriptive statistics

Standard deviation: 487.5653605
Coefficient of variation (CV): 1.978028362
Kurtosis: 176.7311999
Mean: 246.4905812
Median Absolute Deviation (MAD): 67.0062
Skewness: 8.138080021
Sum: 12642501.91
Variance: 237719.9808
Monotonicity: Not monotonic
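The descriptive statistics above can be recomputed with pandas. A minimal sketch, assuming sample (ddof=1) variance and standard deviation, pandas' bias-corrected skewness and excess kurtosis, and MAD taken as the median of absolute deviations from the median. These assumptions are consistent with the figures shown (for Sales, 487.5653605 / 246.4905812 is approximately the reported CV of 1.978, and the Quantity MAD of 1 is a median, not a mean, absolute deviation), but they are not taken from the profiler's source. describe_numeric is a hypothetical helper name.

```python
import pandas as pd


def describe_numeric(s: pd.Series) -> dict:
    """Recompute the report's descriptive statistics for a numeric Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return {
        "std": s.std(),                          # sample standard deviation (ddof=1)
        "cv": s.std() / s.mean(),                # coefficient of variation
        "kurtosis": s.kurt(),                    # excess kurtosis (normal -> 0)
        "mean": s.mean(),
        "mad": (s - s.median()).abs().median(),  # median absolute deviation
        "skewness": s.skew(),
        "sum": s.sum(),
        "variance": s.var(),                     # sample variance (ddof=1)
        "range": s.max() - s.min(),
        "iqr": q3 - q1,
        "monotonic": bool(s.is_monotonic_increasing or s.is_monotonic_decreasing),
    }


stats = describe_numeric(pd.Series([2, 1, 3, 10, 4]))
print(stats["mad"], stats["iqr"], stats["monotonic"])  # 1.0 2.0 False
```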
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
12.96 | 61 | 0.1%
15.552 | 41 | 0.1%
19.44 | 41 | 0.1%
10.368 | 37 | 0.1%
25.92 | 36 | 0.1%
32.4 | 34 | 0.1%
24 | 33 | 0.1%
17.52 | 31 | 0.1%
27.96 | 31 | 0.1%
45.36 | 25 | < 0.1%
Other values (27190) | 50920 | 99.3%

Minimum 5 values
Value | Count | Frequency (%)
0.444 | 1 | < 0.1%
0.556 | 1 | < 0.1%
0.836 | 1 | < 0.1%
0.852 | 1 | < 0.1%
0.876 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
22638.48 | 1 | < 0.1%
17499.95 | 1 | < 0.1%
13999.96 | 1 | < 0.1%
11199.968 | 1 | < 0.1%
10499.97 | 1 | < 0.1%

Quantity
Real number (ℝ≥0)

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.476545136
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Memory size: 400.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 8
Maximum: 14
Range: 13
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.278766314
Coefficient of variation (CV): 0.6554686406
Kurtosis: 2.27588873
Mean: 3.476545136
Median Absolute Deviation (MAD): 1
Skewness: 1.360367731
Sum: 178312
Variance: 5.192775913
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)

Common values
Value | Count | Frequency (%)
2 | 12748 | 24.9%
3 | 9682 | 18.9%
1 | 8963 | 17.5%
4 | 6385 | 12.4%
5 | 4882 | 9.5%
6 | 3020 | 5.9%
7 | 2385 | 4.7%
8 | 1361 | 2.7%
9 | 987 | 1.9%
10 | 276 | 0.5%
Other values (4) | 601 | 1.2%

Minimum 5 values
Value | Count | Frequency (%)
1 | 8963 | 17.5%
2 | 12748 | 24.9%
3 | 9682 | 18.9%
4 | 6385 | 12.4%
5 | 4882 | 9.5%

Maximum 5 values
Value | Count | Frequency (%)
14 | 186 | 0.4%
13 | 83 | 0.2%
12 | 176 | 0.3%
11 | 156 | 0.3%
10 | 276 | 0.5%

Discount
Real number (ℝ≥0)

ZEROS

Distinct: 29
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1429075453
Minimum: 0
Maximum: 0.85
Zeros: 29009
Zeros (%): 56.6%
Memory size: 400.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.2
95th percentile: 0.6
Maximum: 0.85
Range: 0.85
Interquartile range (IQR): 0.2

Descriptive statistics

Standard deviation: 0.2122799317
Coefficient of variation (CV): 1.485435434
Kurtosis: 0.7166824085
Mean: 0.1429075453
Median Absolute Deviation (MAD): 0
Skewness: 1.387774552
Sum: 7329.728
Variance: 0.0450627694
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=29)

Common values
Value | Count | Frequency (%)
0 | 29009 | 56.6%
0.2 | 4998 | 9.7%
0.1 | 4068 | 7.9%
0.4 | 3177 | 6.2%
0.6 | 2006 | 3.9%
0.7 | 1786 | 3.5%
0.5 | 1633 | 3.2%
0.17 | 735 | 1.4%
0.47 | 725 | 1.4%
0.002 | 461 | 0.9%
Other values (19) | 2692 | 5.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 29009 | 56.6%
0.002 | 461 | 0.9%
0.07 | 150 | 0.3%
0.1 | 4068 | 7.9%
0.15 | 459 | 0.9%

Maximum 5 values
Value | Count | Frequency (%)
0.85 | 2 | < 0.1%
0.8 | 316 | 0.6%
0.7 | 1786 | 3.5%
0.65 | 17 | < 0.1%
0.602 | 23 | < 0.1%
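The bin counts in the histogram captions (50 for Sales, Profit and Shipping Cost, 14 for Quantity, 29 for Discount) are consistent with using the smaller of 50 and the number of distinct values. That rule is inferred from this report, not confirmed from the profiler's configuration. A numpy sketch under that assumption; fixed_bin_histogram is a hypothetical helper and the data is a toy example.

```python
import numpy as np


def fixed_bin_histogram(values, max_bins=50):
    """Equal-width histogram using min(max_bins, number of distinct values) bins."""
    values = np.asarray(values, dtype=float)
    bins = min(max_bins, len(np.unique(values)))
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges


quantity = [1, 2, 2, 3, 3, 3, 4, 5, 14]  # toy data, not taken from the report
counts, edges = fixed_bin_histogram(quantity)
print(len(counts), counts.sum())  # 6 9
```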

Profit
Real number (ℝ)

ZEROS

Distinct: 28234
Distinct (%): 55.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.61098248
Minimum: -6599.978
Maximum: 8399.976
Zeros: 668
Zeros (%): 1.3%
Memory size: 400.7 KiB

Quantile statistics

Minimum: -6599.978
5th percentile: -83.90475
Q1: 0
Median: 9.24
Q3: 36.81
95th percentile: 211.5
Maximum: 8399.976
Range: 14999.954
Interquartile range (IQR): 36.81

Descriptive statistics

Standard deviation: 174.3409719
Coefficient of variation (CV): 6.093498258
Kurtosis: 291.4110896
Mean: 28.61098248
Median Absolute Deviation (MAD): 15.96
Skewness: 4.157188533
Sum: 1467457.291
Variance: 30394.77448
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 668 | 1.3%
7.92 | 64 | 0.1%
3.96 | 62 | 0.1%
4.32 | 60 | 0.1%
9 | 56 | 0.1%
2.88 | 55 | 0.1%
2.64 | 54 | 0.1%
5.28 | 52 | 0.1%
2.97 | 49 | 0.1%
4.92 | 48 | 0.1%
Other values (28224) | 50122 | 97.7%

Minimum 5 values
Value | Count | Frequency (%)
-6599.978 | 1 | < 0.1%
-4088.376 | 1 | < 0.1%
-3839.9904 | 1 | < 0.1%
-3701.8928 | 1 | < 0.1%
-3399.98 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
8399.976 | 1 | < 0.1%
6719.9808 | 1 | < 0.1%
5039.9856 | 1 | < 0.1%
4946.37 | 1 | < 0.1%
4630.4755 | 1 | < 0.1%

Shipping Cost
Real number (ℝ≥0)

Distinct: 16753
Distinct (%): 32.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.47856704
Minimum: 1.002
Maximum: 933.57
Zeros: 0
Zeros (%): 0.0%
Memory size: 400.7 KiB

Quantile statistics

Minimum: 1.002
5th percentile: 1.32
Q1: 2.61
Median: 7.79
Q3: 24.45
95th percentile: 111.4095
Maximum: 933.57
Range: 932.568
Interquartile range (IQR): 21.84

Descriptive statistics

Standard deviation: 57.25137324
Coefficient of variation (CV): 2.162177929
Kurtosis: 50.1458447
Mean: 26.47856704
Median Absolute Deviation (MAD): 6.07
Skewness: 5.872860637
Sum: 1358085.703
Variance: 3277.719738
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1.35 | 113 | 0.2%
1.97 | 108 | 0.2%
1.8 | 106 | 0.2%
1.79 | 106 | 0.2%
1.62 | 105 | 0.2%
1.71 | 103 | 0.2%
2.04 | 103 | 0.2%
1.94 | 103 | 0.2%
1.98 | 102 | 0.2%
1.58 | 100 | 0.2%
Other values (16743) | 50241 | 98.0%

Minimum 5 values
Value | Count | Frequency (%)
1.002 | 1 | < 0.1%
1.003 | 1 | < 0.1%
1.01 | 6 | < 0.1%
1.019 | 1 | < 0.1%
1.02 | 6 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
933.57 | 1 | < 0.1%
923.63 | 1 | < 0.1%
915.49 | 1 | < 0.1%
910.16 | 1 | < 0.1%
903.04 | 1 | < 0.1%
< 0.1%

Order Priority
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.7 KiB
Medium: 29433
High: 15501
Critical: 3932
Low: 2424

Length

Max length: 8
Median length: 6
Mean length: 5.4070969
Min length: 3

Characters and Unicode

Total characters: 277330
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: High
2nd row: Critical
3rd row: Medium
4th row: Medium
5th row: Critical

Value | Count | Frequency (%)
Medium | 29433 | 57.4%
High | 15501 | 30.2%
Critical | 3932 | 7.7%
Low | 2424 | 4.7%

Histogram of lengths of the category

Value | Count | Frequency (%)
medium | 29433 | 57.4%
high | 15501 | 30.2%
critical | 3932 | 7.7%
low | 2424 | 4.7%

Most occurring characters

Value | Count | Frequency (%)
i | 52798 | 19.0%
M | 29433 | 10.6%
e | 29433 | 10.6%
d | 29433 | 10.6%
u | 29433 | 10.6%
m | 29433 | 10.6%
H | 15501 | 5.6%
g | 15501 | 5.6%
h | 15501 | 5.6%
C | 3932 | 1.4%
Other values (8) | 26932 | 9.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 226040 | 81.5%
Uppercase Letter | 51290 | 18.5%

Most frequent character per category

Value | Count | Frequency (%)
i | 52798 | 23.4%
e | 29433 | 13.0%
d | 29433 | 13.0%
u | 29433 | 13.0%
m | 29433 | 13.0%
g | 15501 | 6.9%
h | 15501 | 6.9%
r | 3932 | 1.7%
t | 3932 | 1.7%
c | 3932 | 1.7%
Other values (4) | 12712 | 5.6%

Value | Count | Frequency (%)
M | 29433 | 57.4%
H | 15501 | 30.2%
C | 3932 | 7.7%
L | 2424 | 4.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 277330 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
i | 52798 | 19.0%
M | 29433 | 10.6%
e | 29433 | 10.6%
d | 29433 | 10.6%
u | 29433 | 10.6%
m | 29433 | 10.6%
H | 15501 | 5.6%
g | 15501 | 5.6%
h | 15501 | 5.6%
C | 3932 | 1.4%
Other values (8) | 26932 | 9.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 277330 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
i | 52798 | 19.0%
M | 29433 | 10.6%
e | 29433 | 10.6%
d | 29433 | 10.6%
u | 29433 | 10.6%
m | 29433 | 10.6%
H | 15501 | 5.6%
g | 15501 | 5.6%
h | 15501 | 5.6%
C | 3932 | 1.4%
Other values (8) | 26932 | 9.7%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables: for an exact linear relationship, the angle of the line to the x-axis does not affect r, only whether the slope is positive or negative.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
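The definition translates directly into code. A minimal numpy sketch (pearson_r is a hypothetical helper); it uses population (ddof=0) covariance and standard deviations, and because the same normalisation appears in the numerator and the denominator, the result equals the sample-based formula.

```python
import numpy as np


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
    return float(cov / (x.std() * y.std()))          # np.std defaults to ddof=0


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))
print(pearson_r(x, [10 * v + 3 for v in y]))  # same value up to rounding
```

Rescaling y by a positive factor and shifting it leaves r unchanged, which is the location and scale invariance described above.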

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations; equivalently, ρ is Pearson's r computed on the ranks.
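Following that description, ρ can be computed by ranking each variable (tied values share the average rank) and applying Pearson's formula to the ranks. A numpy-only sketch with hypothetical helper names average_ranks and spearman_rho:

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a), dtype=float)
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1                               # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx ** 2) * np.sum(ry ** 2)))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]    # nonlinear but strictly increasing
print(spearman_rho(x, y))  # 1.0
```

For y = x³ the relationship is nonlinear but strictly monotonic, so ρ is exactly 1 even though Pearson's r on the raw values is below 1.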

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

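To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the tau-a form; when there are ties, SciPy's kendalltau (which pandas uses for corr(method="kendall")) returns the tie-adjusted tau-b by default.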
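Counting pairs directly gives a short, if quadratic-time, sketch of the formula described above. It implements the tau-a variant: tied pairs count as neither concordant nor discordant, the denominator is all n(n-1)/2 pairs, and no tau-b tie adjustment is applied. kendall_tau_a is a hypothetical helper.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant -> 4/6
```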

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The plain empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
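A numpy sketch of a bias-corrected estimator, using the correction as it is usually stated for Bergsma (2013). With n observations in an r × k contingency table and φ² = χ²/n, the corrected φ² is max(0, φ² - (k-1)(r-1)/(n-1)), the corrected dimensions are r - (r-1)²/(n-1) and k - (k-1)²/(n-1), and V is the square root of the corrected φ² divided by the smaller corrected dimension minus one. χ² here is the plain Pearson statistic without Yates' continuity correction; whether the report applies one is not stated, and cramers_v_corrected is a hypothetical name.

```python
import numpy as np


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    xs, xi = np.unique(np.asarray(x), return_inverse=True)
    ys, yi = np.unique(np.asarray(y), return_inverse=True)
    table = np.zeros((len(xs), len(ys)))
    np.add.at(table, (xi, yi), 1)  # contingency table of counts

    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-squared, no Yates
    phi2 = chi2 / n

    r, k = table.shape
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:  # one of the variables has a single category
        return float("nan")
    return float(np.sqrt(phi2_corr / denom))


same = ["a"] * 50 + ["b"] * 50
print(cramers_v_corrected(same, same))  # perfect association, approximately 1.0
print(cramers_v_corrected(["a", "b"] * 50, ["c", "c", "d", "d"] * 25))  # independent: 0.0
```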

Missing values


Sample

First rows

Index | Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Postal Code | City | State | Country | Region | Market | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | Shipping Cost | Order Priority
0 | 40098 | CA-2014-AB10015140-41954 | 2014-11-11 | 2014-11-13 | First Class | AB-100151402 | Aaron Bergman | Consumer | 73120.0 | Oklahoma City | Oklahoma | United States | Central US | USCA | TEC-PH-5816 | Technology | Phones | Samsung Convoy 3 | 221.980 | 2 | 0.0 | 62.1544 | 40.77 | High
1 | 26341 | IN-2014-JR162107-41675 | 2014-02-05 | 2014-02-07 | Second Class | JR-162107 | Justin Ritter | Corporate | NaN | Wollongong | New South Wales | Australia | Oceania | Asia Pacific | FUR-CH-5379 | Furniture | Chairs | Novimex Executive Leather Armchair, Black | 3709.395 | 9 | 0.1 | -288.7650 | 923.63 | Critical
2 | 25330 | IN-2014-CR127307-41929 | 2014-10-17 | 2014-10-18 | First Class | CR-127307 | Craig Reiter | Consumer | NaN | Brisbane | Queensland | Australia | Oceania | Asia Pacific | TEC-PH-5356 | Technology | Phones | Nokia Smart Phone, with Caller ID | 5175.171 | 9 | 0.1 | 919.9710 | 915.49 | Medium
3 | 13524 | ES-2014-KM1637548-41667 | 2014-01-28 | 2014-01-30 | First Class | KM-1637548 | Katherine Murray | Home Office | NaN | Berlin | Berlin | Germany | Western Europe | Europe | TEC-PH-5267 | Technology | Phones | Motorola Smart Phone, Cordless | 2892.510 | 5 | 0.1 | -96.5400 | 910.16 | Medium
4 | 47221 | SG-2014-RH9495111-41948 | 2014-11-05 | 2014-11-06 | Same Day | RH-9495111 | Rick Hansen | Consumer | NaN | Dakar | Dakar | Senegal | Western Africa | Africa | TEC-CO-6011 | Technology | Copiers | Sharp Wireless Fax, High-Speed | 2832.960 | 8 | 0.0 | 311.5200 | 903.04 | Critical
5 | 22732 | IN-2014-JM156557-41818 | 2014-06-28 | 2014-07-01 | Second Class | JM-156557 | Jim Mitchum | Corporate | NaN | Sydney | New South Wales | Australia | Oceania | Asia Pacific | TEC-PH-5842 | Technology | Phones | Samsung Smart Phone, with Caller ID | 2862.675 | 5 | 0.1 | 763.2750 | 897.35 | Critical
6 | 30570 | IN-2012-TS2134092-41219 | 2012-11-06 | 2012-11-08 | First Class | TS-2134092 | Toby Swindell | Consumer | NaN | Porirua | Wellington | New Zealand | Oceania | Asia Pacific | FUR-CH-5378 | Furniture | Chairs | Novimex Executive Leather Armchair, Adjustable | 1822.080 | 4 | 0.0 | 564.8400 | 894.77 | Critical
7 | 31192 | IN-2013-MB1808592-41378 | 2013-04-14 | 2013-04-18 | Standard Class | MB-1808592 | Mick Brown | Consumer | NaN | Hamilton | Waikato | New Zealand | Oceania | Asia Pacific | FUR-TA-3764 | Furniture | Tables | Chromcraft Conference Table, Fully Assembled | 5244.840 | 6 | 0.0 | 996.4800 | 878.38 | High
8 | 40099 | CA-2014-AB10015140-41954 | 2014-11-11 | 2014-11-13 | First Class | AB-100151402 | Aaron Bergman | Consumer | 73120.0 | Oklahoma City | Oklahoma | United States | Central US | USCA | FUR-BO-5957 | Furniture | Bookcases | Sauder Facets Collection Library, Sky Alder Finish | 341.960 | 2 | 0.0 | 54.7136 | 25.27 | High
9 | 36258 | CA-2012-AB10015140-40974 | 2012-03-06 | 2012-03-07 | First Class | AB-100151404 | Aaron Bergman | Consumer | 98103.0 | Seattle | Washington | United States | Western US | USCA | FUR-CH-4421 | Furniture | Chairs | Global Push Button Manager's Chair, Indigo | 48.712 | 1 | 0.2 | 5.4801 | 11.13 | High

Last rows

Index | Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Postal Code | City | State | Country | Region | Market | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | Shipping Cost | Order Priority
51280 | 35112 | CA-2014-ZD21925140-41829 | 2014-07-09 | 2014-07-09 | Same Day | ZD-219251408 | Zuschuss Donatelli | Consumer | 32216.0 | Jacksonville | Florida | United States | Southern US | USCA | OFF-PA-6474 | Office Supplies | Paper | Xerox 1921 | 15.984 | 2 | 0.2 | 4.9950 | 2.010 | Medium
51281 | 6039 | MX-2015-HG1502518-42164 | 2015-06-09 | 2015-06-11 | First Class | HG-1502518 | Hunter Glantz | Consumer | NaN | Bragança Paulista | São Paulo | Brazil | South America | LATAM | OFF-PA-4475 | Office Supplies | Paper | Green Bar Message Books, Multicolor | 84.000 | 5 | 0.0 | 9.2000 | 1.019 | High
51282 | 24175 | IN-2015-DB132707-42221 | 2015-08-05 | 2015-08-10 | Standard Class | DB-132707 | Deborah Brumfield | Home Office | NaN | Townsville | Queensland | Australia | Oceania | Asia Pacific | OFF-BI-3253 | Office Supplies | Binders | Avery Binder, Economy | 58.050 | 5 | 0.1 | 19.9500 | 1.010 | Medium
51283 | 24105 | IN-2015-KH1633058-42154 | 2015-05-30 | 2015-05-30 | Same Day | KH-1633058 | Katharine Harms | Corporate | NaN | Lucknow | Uttar Pradesh | India | Southern Asia | Asia Pacific | OFF-PA-4007 | Office Supplies | Paper | Eaton Parchment Paper, Premium | 26.940 | 2 | 0.0 | 1.8600 | 1.010 | High
51284 | 9922 | MX-2013-KM1637593-41636 | 2013-12-28 | 2013-12-31 | First Class | KM-1637593 | Katherine Murray | Home Office | NaN | Managua | Managua | Nicaragua | Central America | LATAM | OFF-PA-5876 | Office Supplies | Paper | SanDisk Message Books, 8.5 x 11 | 18.640 | 1 | 0.0 | 8.0000 | 1.010 | Medium
51285 | 29002 | IN-2015-KE1642066-42174 | 2015-06-19 | 2015-06-19 | Same Day | KE-1642066 | Katrina Edelman | Corporate | NaN | Kure | Hiroshima | Japan | Eastern Asia | Asia Pacific | OFF-FA-3072 | Office Supplies | Fasteners | Advantus Thumb Tacks, 12 Pack | 65.100 | 5 | 0.0 | 4.5000 | 1.010 | Medium
51286 | 34337 | US-2014-ZD21925140-41765 | 2014-05-06 | 2014-05-10 | Standard Class | ZD-219251408 | Zuschuss Donatelli | Consumer | 37421.0 | Chattanooga | Tennessee | United States | Southern US | USCA | FUR-FU-4070 | Furniture | Furnishings | Eldon Image Series Desk Accessories, Burgundy | 16.720 | 5 | 0.2 | 3.3440 | 1.930 | High
51287 | 31315 | CA-2012-ZD21925140-41147 | 2012-08-26 | 2012-08-31 | Second Class | ZD-219251404 | Zuschuss Donatelli | Consumer | 94109.0 | San Francisco | California | United States | Western US | USCA | OFF-AR-5321 | Office Supplies | Art | Newell 341 | 8.560 | 2 | 0.0 | 2.4824 | 1.580 | High
51288 | 9596 | MX-2013-RB1979518-41322 | 2013-02-17 | 2013-02-21 | Standard Class | RB-1979518 | Ross Baird | Home Office | NaN | Valinhos | São Paulo | Brazil | South America | LATAM | OFF-BI-2919 | Office Supplies | Binders | Acco Index Tab, Economy | 13.440 | 2 | 0.0 | 2.4000 | 1.003 | Medium
51289 | 6147 | MX-2013-MC1810093-41416 | 2013-05-22 | 2013-05-26 | Second Class | MC-1810093 | Mick Crebagga | Consumer | NaN | Tipitapa | Managua | Nicaragua | Central America | LATAM | OFF-PA-3990 | Office Supplies | Paper | Eaton Computer Printout Paper, 8.5 x 11 | 61.380 | 3 | 0.0 | 1.8000 | 1.002 | High